
    Unsupervised Deep Single-Image Intrinsic Decomposition using Illumination-Varying Image Sequences

    Machine learning based Single Image Intrinsic Decomposition (SIID) methods decompose a captured scene into its albedo and shading images by using the knowledge of a large set of known and realistic ground truth decompositions. Collecting and annotating such a dataset is an approach that cannot scale to sufficient variety and realism. We free ourselves from this limitation by training on unannotated images. Our method leverages the observation that two images of the same scene but with different lighting provide useful information on their intrinsic properties: by definition, albedo is invariant to lighting conditions, and cross-combining the estimated albedo of a first image with the estimated shading of a second one should lead back to the second one's input image. We transcribe this relationship into a siamese training scheme for a deep convolutional neural network that decomposes a single image into albedo and shading. The siamese setting allows us to introduce a new loss function including such cross-combinations, and to train solely on (time-lapse) images, discarding the need for any ground truth annotations. As a result, our method has the good properties of i) taking advantage of the time-varying information of image sequences in the (pre-computed) training step, ii) not requiring ground truth data to train on, and iii) being able to decompose single images of unseen scenes at runtime. To demonstrate and evaluate our work, we additionally propose a new rendered dataset containing illumination-varying scenes and a set of quantitative metrics to evaluate SIID algorithms. Despite its unsupervised nature, our results compete with state-of-the-art methods, including supervised and non-data-driven methods. Comment: To appear in Pacific Graphics 201
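The cross-combination idea in this abstract can be illustrated with a small sketch. It assumes the common intrinsic image model in which an image is the pixelwise product of albedo and shading, uses flat lists of grayscale values, and scores the mismatch with a mean squared error. The function names and the choice of error are illustrative assumptions, not the authors' actual loss or network.

```python
def multiply(albedo, shading):
    """Recombine albedo and shading pixelwise (assumed model: image = albedo * shading)."""
    return [a * s for a, s in zip(albedo, shading)]


def mse(x, y):
    """Mean squared error between two equally sized pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def cross_combination_loss(img1, img2, albedo1, shading1, albedo2, shading2):
    """Hypothetical cross-combination term for two images of one scene.

    Albedo estimated from one image, recombined with the shading estimated
    from the other image, should reproduce that other image.
    """
    loss_12 = mse(multiply(albedo1, shading2), img2)  # albedo 1 with shading 2 -> image 2
    loss_21 = mse(multiply(albedo2, shading1), img1)  # albedo 2 with shading 1 -> image 1
    return loss_12 + loss_21
```

A trivial decomposition that sets the albedo to the image itself and the shading to one everywhere reconstructs each image perfectly on its own, but it is penalized by this cross term as soon as the two images differ, which is what makes the pairing informative.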

    Fast Optimal Transport Averaging of Neuroimaging Data

    Knowing how the human brain is anatomically and functionally organized at the level of a group of healthy individuals or patients is the primary goal of neuroimaging research. Yet computing an average of brain imaging data defined over a voxel grid or a triangulation remains a challenge. Data are large, the geometry of the brain is complex and the between-subject variability leads to spatially or temporally non-overlapping effects of interest. To address the problem of variability, data are commonly smoothed before group linear averaging. In this work we build on ideas originally introduced by Kantorovich to propose a new algorithm that can efficiently average non-normalized data defined over arbitrary discrete domains using transportation metrics. We show how Kantorovich means can be linked to Wasserstein barycenters in order to take advantage of an entropic smoothing approach. It leads to a smooth convex optimization problem and an algorithm with strong convergence guarantees. We illustrate the versatility of this tool and its empirical behavior on functional neuroimaging data, functional MRI and magnetoencephalography (MEG) source estimates, defined on voxel grids and triangulations of the folded cortical surface. Comment: Information Processing in Medical Imaging (IPMI), Jun 2015, Isle of Skye, United Kingdom. Springer, 201
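To give a feel for entropic smoothing of Wasserstein barycenters, here is a minimal sketch of a well-known related scheme: Sinkhorn-style iterative scaling for the barycenter of normalized histograms on a shared grid. It is not the Kantorovich-mean algorithm of this abstract, which targets non-normalized data. The function name, the regularization strength `eps`, and the iteration count are illustrative assumptions.

```python
import math


def entropic_barycenter(hists, weights, cost, eps=4.0, iters=300):
    """Entropy-regularized barycenter of histograms sharing one discrete domain.

    hists: list of histograms (each sums to 1); weights: nonnegative, sum to 1;
    cost: square matrix of ground costs between domain points.
    """
    n = len(cost)
    # Gibbs kernel derived from the ground cost.
    K = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(n)]
    v = [[1.0] * n for _ in hists]
    b = [1.0 / n] * n
    for _ in range(iters):
        Ktu = []
        for k, p in enumerate(hists):
            # Scale rows so each coupling has the input histogram as marginal.
            Kv = [sum(K[i][j] * v[k][j] for j in range(n)) for i in range(n)]
            u = [p[i] / Kv[i] for i in range(n)]
            Ktu.append([sum(K[i][j] * u[i] for i in range(n)) for j in range(n)])
        # Barycenter estimate: weighted geometric mean across inputs.
        b = [math.exp(sum(w * math.log(Ktu[k][j]) for k, w in enumerate(weights)))
             for j in range(n)]
        # Scale columns so every coupling shares the barycenter as marginal.
        v = [[b[j] / Ktu[k][j] for j in range(n)] for k in range(len(hists))]
    return b


# Two mirrored histograms on a 5-point line with squared-distance cost.
cost = [[float((i - j) ** 2) for j in range(5)] for i in range(5)]
left = [0.6, 0.3, 0.1, 0.0, 0.0]
right = left[::-1]
bary = entropic_barycenter([left, right], [0.5, 0.5], cost)
```

Unlike a plain linear average, which would keep two separate bumps at the ends of the line, the transport-based average places its mass between the two inputs.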

    Fast Modal Sounds with Scalable Frequency-Domain Synthesis

    Audio rendering of impact sounds, such as those caused by falling objects or explosion debris, adds realism to interactive 3D audiovisual applications, and can be convincingly achieved using modal sound synthesis. Unfortunately, mode-based computations can become prohibitively expensive when many objects, each with many modes, are impacted simultaneously. We introduce a fast sound synthesis approach, based on short-time Fourier Transforms, that exploits the inherent sparsity of modal sounds in the frequency domain. For our test scenes, this "fast mode summation" can give speedups of 5-8 times compared to a time-domain solution, with slight degradation in quality. We discuss different reconstruction windows, affecting the quality of impact sound "attacks". Our Fourier-domain processing method allows us to introduce a scalable, real-time, audio processing pipeline for both recorded and modal sounds, with auditory masking and sound source clustering. To avoid abrupt computation peaks, such as during the simultaneous impacts of an explosion, we use crossmodal perception results on audiovisual synchrony to effect temporal scheduling. We also conducted a pilot perceptual user evaluation of our method. Our implementation results show that we can treat complex audiovisual scenes in real time with high quality.
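The sparsity that this abstract exploits can be seen in a small sketch. It assumes the usual modal model, in which each mode is an exponentially damped sinusoid, synthesizes one frame in the time domain (the baseline the paper compares against), and inspects its spectrum with a naive DFT. The mode parameters, sample rate, frame length, and the 10% threshold are illustrative assumptions, and this is not the paper's fast mode summation.

```python
import cmath
import math


def synthesize_modes(modes, sample_rate, num_samples):
    """Time-domain modal synthesis.

    modes: list of (frequency_hz, amplitude, damping_per_second) tuples.
    Each mode contributes amplitude * exp(-damping * t) * sin(2 * pi * frequency * t).
    """
    out = []
    for n in range(num_samples):
        t = n / sample_rate
        out.append(sum(a * math.exp(-d * t) * math.sin(2.0 * math.pi * f * t)
                       for f, a, d in modes))
    return out


def dft_magnitudes(frame):
    """Naive O(N^2) DFT magnitudes for the bins below the Nyquist frequency."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * m / n)
                    for m, x in enumerate(frame)))
            for k in range(n // 2)]


# Two hypothetical modes, analysed over one 256-sample frame at 8 kHz.
modes = [(1000.0, 1.0, 5.0), (2000.0, 0.5, 8.0)]
frame = synthesize_modes(modes, sample_rate=8000, num_samples=256)
mags = dft_magnitudes(frame)
peak = max(mags)
# Bins carrying a noticeable share of the energy: only a handful out of 128.
active_bins = [k for k, m in enumerate(mags) if m > 0.1 * peak]
```

With a bin spacing of 8000 / 256 = 31.25 Hz, the two modes fall on bins 32 and 64, so a frequency-domain synthesizer only needs to touch a few bins per mode instead of every time sample.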

    MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

    We introduce a method to convert stereo 360° (omnidirectional stereo) imagery into a layered, multi-sphere image representation for six degree-of-freedom (6DoF) rendering. Stereo 360° imagery can be captured from multi-camera systems for virtual reality (VR), but lacks motion parallax and correct-in-all-directions disparity cues. Together, these can quickly lead to VR sickness when viewing content. One solution is to try to generate a format suitable for 6DoF rendering, such as by estimating depth. However, this raises questions as to how to handle disoccluded regions in dynamic scenes. Our approach is to simultaneously learn depth and disocclusions via a multi-sphere image representation, which can be rendered with correct 6DoF disparity and motion parallax in VR. This significantly improves comfort for the viewer, and can be inferred and rendered in real time on modern GPU hardware. Together, these move towards making VR video a more comfortable immersive medium. Comment: 25 pages, 13 figures, Published at European Conference on Computer Vision (ECCV 2020), Project Page: http://visual.cs.brown.edu/matryodshk

    Perceptual quality of BRDF approximations: dataset and metrics

    Bidirectional Reflectance Distribution Functions (BRDFs) are pivotal to the perceived realism in image synthesis. While measured BRDF datasets are available, reflectance functions are most of the time approximated by analytical formulas for storage efficiency reasons. These approximations are often obtained by minimizing metrics such as L2 (or weighted quadratic) distances, but these metrics do not usually correlate well with perceptual quality when the BRDF is used in a rendering context, which motivates a perceptual study. The contributions of this paper are threefold. First, we perform a large-scale user study to assess the perceptual quality of 2026 BRDF approximations, resulting in 84138 judgments across 1005 unique participants. We explore this dataset and analyze perceptual scores based on material type and illumination. Second, we assess nine analytical BRDF models in their ability to approximate tabulated BRDFs. Third, we assess several image-based and BRDF-based (Lp, optimal transport and kernel distance) metrics in their ability to approximate perceptual similarity judgments.

    Primal Heuristics for Wasserstein Barycenters

    This paper presents primal heuristics for the computation of Wasserstein Barycenters of a given set of discrete probability measures. The computation of a Wasserstein Barycenter is formulated as an optimization problem over the space of discrete probability measures. In practice, the barycenter is a discrete probability measure which minimizes the sum of the pairwise Wasserstein distances between the barycenter itself and each input measure. While this problem can be formulated using Linear Programming techniques, it remains a challenging problem due to the size of real-life instances. In this paper, we propose simple but efficient primal heuristics, which exploit the properties of the optimal plan obtained while computing the Wasserstein Distance between a pair of probability measures. In order to evaluate the proposed primal heuristics, we have performed extensive computational tests using random Gaussian distributions, the MNIST handwritten digit dataset, and the Fashion MNIST dataset introduced by Zalando. We also used Translated MNIST, a modification of MNIST which contains original images, rescaled randomly and translated into a larger image. We compare the barycenters computed by our heuristics with the exact solutions obtained with a commercial Linear Programming solver, and with a state-of-the-art algorithm based on Gaussian convolutions. Our results show that the proposed heuristics yield, in very short run time, an average optimality gap significantly smaller than 1%.
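As a toy illustration of how an optimal plan between a pair of measures can be turned into a barycenter, the sketch below handles the one-dimensional case only. It assumes sorted supports and a convex cost such as squared distance, so that the monotone (north-west corner) coupling is optimal, and it moves each unit of transported mass to the midpoint of its source and target. The function names are hypothetical, and this is not one of the paper's heuristics.

```python
def monotone_plan(a, b, tol=1e-12):
    """Optimal plan between two 1D measures with sorted supports.

    a, b: mass vectors with equal total mass. Returns a list of
    (i, j, mass) entries filled in by the north-west corner rule.
    """
    a, b = list(a), list(b)
    i = j = 0
    plan = []
    while i < len(a) and j < len(b):
        m = min(a[i], b[j])
        if m > tol:
            plan.append((i, j, m))
        a[i] -= m
        b[j] -= m
        a_left, b_left = a[i], b[j]
        if a_left <= tol:
            i += 1  # source point exhausted
        if b_left <= tol:
            j += 1  # target point exhausted
    return plan


def midpoint_barycenter(x, a, y, b):
    """Equal-weight barycenter of two 1D measures, read off the optimal plan.

    x, y: sorted support points; a, b: their masses.
    Returns a dict mapping barycenter support point to mass.
    """
    bary = {}
    for i, j, m in monotone_plan(a, b):
        point = 0.5 * (x[i] + y[j])
        bary[point] = bary.get(point, 0.0) + m
    return bary
```

For example, `midpoint_barycenter([0.0], [1.0], [2.0, 4.0], [0.25, 0.75])` splits the single source point according to the plan and places mass 0.25 at 1.0 and mass 0.75 at 2.0.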